skip to main content


Search for: All records

Creators/Authors contains: "Kurgan, Lukasz"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

  1. Abstract Motivation

    Development of bioinformatics methods is a long, complex and resource-hungry process. Hundreds of these tools were released. While some methods are highly cited and used, many suffer relatively low citation rates. We empirically analyze a large collection of recently released methods in three diverse protein function and disorder prediction areas to identify key factors that contribute to increased citations.

    Results

    We show that provision of a working web server significantly boosts citation rates. On average, methods with working web servers generate three times as many citations compared to tools that are available as only source code, have no code and no server, or are no longer available. This observation holds consistently across different research areas and publication years. We also find that differences in predictive performance are unlikely to impact citation rates. Overall, our empirical results suggest that a relatively low-cost investment into the provision and long-term support of web servers would substantially increase the impact of bioinformatics tools.

     
    more » « less
  2. Abstract

    Current predictors of DNA-binding residues (DBRs) from protein sequences belong to two distinct groups, those trained on binding annotations extracted from structured protein-DNA complexes (structure-trained) vs. intrinsically disordered proteins (disorder-trained). We complete the first empirical analysis of predictive performance across the structure- and disorder-annotated proteins for a representative collection of ten predictors. Majority of the structure-trained tools perform well on the structure-annotated proteins while doing relatively poorly on the disorder-annotated proteins, and vice versa. Several methods make accurate predictions for the structure-annotated proteins or the disorder-annotated proteins, but none performs highly accurately for both annotation types. Moreover, most predictors make excessive cross-predictions for the disorder-annotated proteins, where residues that interact with non-DNA ligand types are predicted as DBRs. Motivated by these results, we design, validate and deploy an innovative meta-model, hybridDBRpred, that uses deep transformer network to combine predictions generated by three best current predictors. HybridDBRpred provides accurate predictions and low levels of cross-predictions across the two annotation types, and is statistically more accurate than each of the ten tools and baseline meta-predictors that rely on averaging and logistic regression. We deploy hybridDBRpred as a convenient web server at http://biomine.cs.vcu.edu/servers/hybridDBRpred/ and provide the corresponding source code at https://github.com/jianzhang-xynu/hybridDBRpred.

     
    more » « less
  3. Free, publicly-accessible full text available November 1, 2024
  4. Free, publicly-accessible full text available November 1, 2024
  5. Abstract

    Intrinsic disorder in proteins is relatively abundant in nature and essential for a broad spectrum of cellular functions. While disorder can be accurately predicted from protein sequences, as it was empirically demonstrated in recent community-organized assessments, it is rather challenging to collect and compile a comprehensive prediction that covers multiple disorder functions. To this end, we introduce the DEPICTER2 (DisorderEd PredictIon CenTER) webserver that offers convenient access to a curated collection of fast and accurate disorder and disorder function predictors. This server includes a state-of-the-art disorder predictor, flDPnn, and five modern methods that cover all currently predictable disorder functions: disordered linkers and protein, peptide, DNA, RNA and lipid binding. DEPICTER2 allows selection of any combination of the six methods, batch predictions of up to 25 proteins per request and provides interactive visualization of the resulting predictions. The webserver is freely available at http://biomine.cs.vcu.edu/servers/DEPICTER2/

     
    more » « less
  6. Abstract

    The DescribePROT database of amino acid-level descriptors of protein structures and functions was substantially expanded since its release in 2020. This expansion includes substantial increase in the size, scope, and quality of the underlying data, the addition of experimental structural information, the inclusion of new data download options, and an upgraded graphical interface. DescribePROT currently covers 19 structural and functional descriptors for proteins in 273 reference proteomes generated by 11 accurate and complementary predictive tools. Users can search our resource in multiple ways, interact with the data using the graphical interface, and download data at various scales including individual proteins, entire proteomes, and whole database. The annotations in DescribePROT are useful for a broad spectrum of studies that include investigations of protein structure and function, development and validation of predictive tools, and to support efforts in understanding molecular underpinnings of diseases and development of therapeutics. DescribePROT can be freely accessed at http://biomine.cs.vcu.edu/servers/DESCRIBEPROT/.

     
    more » « less
  7. Abstract

    There are over 100 computational predictors of intrinsic disorder. These methods predict amino acid‐level propensities for disorder directly from protein sequences. The propensities can be used to annotate putative disordered residues and regions. This unit provides a practical and holistic introduction to the sequence‐based intrinsic disorder prediction. We define intrinsic disorder, explain the format of computational prediction of disorder, and identify and describe several accurate predictors. We also introduce recently released databases of intrinsic disorder predictions and use an illustrative example to provide insights into how predictions should be interpreted and combined. Lastly, we summarize key experimental methods that can be used to validate computational predictions. © 2023 Wiley Periodicals LLC.

     
    more » « less